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Abstract 

Background: Biochemical systems with relatively low numbers of components must be simulated stochastically in 
order to capture their inherent noise. Although there has recently been considerable work on discrete stochastic 
solvers, there is still a need for numerical methods that are both fast and accurate. The Bulirsch-Stoer method is an 
established method for solving ordinary differential equations that possesses both of these qualities. 

Results: In this paper, we present the Stochastic Bulirsch-Stoer method, a new numerical method for simulating 
discrete chemical reaction systems, inspired by its deterministic counterpart. It is able to achieve an excellent 
efficiency due to the fact that it is based on an approach with high deterministic order, allowing for larger stepsizes 
and leading to fast simulations. We compare it to the Euler r-leap, as well as two more recent r-leap methods, on a 
number of example problems, and find that as well as being very accurate, our method is the most robust, in terms of 
efficiency, of all the methods considered in this paper. The problems it is most suited for are those with increased 
populations that would be too slow to simulate using Gillespie's stochastic simulation algorithm. For such problems, it 
is likely to achieve higher weak order in the moments. 

Conclusions: The Stochastic Bulirsch-Stoer method is a novel stochastic solver that can be used for fast and accurate 
simulations. Crucially, compared to other similar methods, it better retains its high accuracy when the timesteps are 
increased. Thus the Stochastic Bulirsch-Stoer method is both computationally efficient and robust. These are key 
properties for any stochastic numerical method, as they must typically run many thousands of simulations. 

Keywords: Stochastic simulation, Discrete stochastic methods, Bulirsch-Stoer, r-leap, High-order methods 



Background 

Microscopic processes with few interacting components 
can have considerable effects at the macroscopic scale 
[1-3]. Stochasticity is a defining property of these pro- 
cesses, which can have so few component particles 
that random fluctuations dominate their behaviour [4,5]. 
Stochastic simulation methods take proper account of 
these fluctuations, as opposed to deterministic methods 
that assume a system does not deviate from its mean 
behaviour [6]; although deterministic methods can often 
be useful for an approximate description of the dynamics 
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of a system, their results are not always representative 
[7,8]. 

A common stochastic modelling approach is to consider 
the system as a continuous-time Markov jump process 
[9]. The stochastic simulation algorithm (SSA) of Gillespie 
[10] is a simple and exact method for generating Markov 
paths. However, because it keeps track of each reaction, it 
can be too computationally costly for more complex sys- 
tems or those with frequent reactions. Many approximate 
methods have since been developed, which use similar 
principles as the SSA but group many reactions into a sin- 
gle calculation, reducing computational time (for a recent 
review, see [11]). 

The first of these is commonly called the Euler or 
Poisson r-leap [12]; it corresponds to the Euler method 
for ordinary differential equations (ODEs), and samples 
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a Poisson random variable at each step. The original 
stepsize selection procedure has since been modified to 
improve accuracy [13,14]. To deal with issues of nega- 
tive populations, Tian and Burrage [15] and Chatterjee 
et al [16] introduced the binomial r-leap, which samples 
a binomial random variable at each step. In addition, a 
newer binomial r-leap [17] and a multinomial r-leap [18] 
have since been proposed. 

In his seminal paper, Gillespie also proposed the mid- 
point r-leap, a higher-order method that allows for 
larger timesteps by reducing the inherent bias of the 
r-leap [12]. More recently, other higher-order methods 
have been developed, such as the unbiased r-leap [19], 
random-corrected r-leap [20], # -trapezoidal r-leap [21], 
and extrapolated r-leap [22]. Rather than focussing on 
adaptively optimising the timestep to reduce processing 
time, these methods instead improve their order of accu- 
racy so that they find more accurate results for a given 
stepsize. Because of this, they can use larger timesteps for 
a desired error level, reducing processing time. 

In this paper, we introduce a new adaptive-stepsize 
method for simulating discrete Markov paths, which we 
call the Stochastic Bulirsh-Stoer (SBS) method. This is 
inspired by the deterministic method of the same name, 
a very accurate method for solving ODEs, based on 
Richardson extrapolation. Its high accuracy due to extrap- 
olation, and its ability to adaptively maximise the timestep 
make the Bulirsch-Stoer method one of the most power- 
ful ODE solvers. Possessing these same advantages, our 
SBS method is a very efficient and accurate new discrete 
stochastic numerical method. 

The SBS calculates several approximations for the 
expected number of reactions occurring per timestep r 
using stepsizes r/2, r/4, and so on, and extrapolates these 
to arrive at a very accurate estimate; the state of the sys- 
tem is then found by sampling a Poisson distribution with 
this parameter. The SBS is in some ways similar to the 
extrapolated r-leap methods we proposed in a previous 
paper [22]. These involve running simulations with a r- 
leap method of choice over the full time period of interest, 
and then taking moments and extrapolating them. The 
SBS is also based on extrapolation, but the extrapolation 
is carried out inside each timestep, rather than at the end 
of the simulation, allowing r to be optimised at each step. 

Overview of stochastic methods 

We start with a chemical system of N species and M 
reactions, interacting in a fixed volume Q at constant 
temperature, that is both well-stirred and homogeneous. 
Thus we assume that individual molecules undergo both 
reactive collisions and non-reactive ones, and the latter 
is more frequent than the former, mixing the molecules 
thoroughly [11]. Individual molecules are not tracked, 
rather it is their total numbers that we are interested in. 



These are stored in an N x 1 state vector, x = X(£) = 
(Xl, . . .,X/v) r , that contains the integer number of each 
type of molecule at some time t. Reactions are represented 
by an N x M matrix consisting of stoichiometric vec- 
tors Vj = (vy, . . . , VNj) T >j = 1, . . . , M, which dictate how 
each reaction changes the system state, and anMxl vec- 
tor of propensity functions <z/(x), where aj(x)dt gives the 
probabilities of each reaction occurring in an infinitesi- 
mal time interval dt. Together, these three variables fully 
characterise the chemical system as it evolves through 
time. In this paper, we adopt the following notation: a 
bold font variable refers to an N x 1 vector, e.g. X(£), and 
unless otherwise specified the indices i = 1, . . . , N and 
; = 1,...,M. 

A conceptually simple way of simulating problems using 
this framework is the SSA of Gillespie [10]. It steps 
along reaction-by-reaction, at each step calculating the 
(exponentially-distributed) time until the next reaction r, 
and the reaction / that will occur. The state vector is 
evolved in time according to the update equation 



M 



x n ^=x n + J2 v J K P K J = 



1 if;=/, 
0 otherwise, 



(i) 



i.e. only one reaction occurs over [t,t + r). Both r and / 
are sampled randomly as required by the stochastic nature 
of the process: 

Algorithm 1 SSA (Direct Method) 
With the system in state X n at time t n : 

1. Sample r\ and r<i from the unit-interval uniform 
distribution. 



2. Calculate time until next reaction r 



ao(X 1 



where a 0 (X n ) = Y^L\ aj(X n ). 

3. Next reaction / is the smallest integer such that 

r 2 a 0 (X n ) < Y!j=\ aj(X n ). 

4. Update X n as given by Eq. (1) and t n +\ = t n + r. 



The SSA is a statistically exact method for generating 
Monte Carlo paths. That is, a histogram built up from an 
infinite number of simulations of the SSA will be identi- 
cal to the true histogram of the system. It is the stochastic 
method of choice for many researchers, but it has one 
main limitation: as with other stochastic methods, many 
realisations (usually starting at 10 4 or 10 5 ) must be simu- 
lated to get a reasonable idea of the histogram shape, and 
SSA simulations can be slow depending on the problem. 

The r-leap [12] was introduced by Gillespie as a faster 
alternative to the SSA. It improves speed by evaluating 
many reactions in one step, which is typically much larger 
than that of the SSA. This allows the r-leap to be generally 



Szekely et al. BMC Systems Biology 201 4, 8:71 
http://www.biomedcentral.eom/1752-0509/8/71 



Page 3 of 18 



very fast compared to the SSA, but also means that it is not 
exact. Assuming r is sufficiently small so that the propen- 
sities do not change significantly during each step (the 
leap condition), the number of reactions occurring dur- 
ing [ t, t + r), Kj, is a Poisson random variable [11,12] with 
parameter aj(x)x. The simplest r-leap implementation is 
the Euler r-leap with fixed stepsize. 



Algorithm 2 Euler r-leap method 

With the system in state X n at time t n with given stepsize x: 

1 . Sample Kj = V (aj (X w ) r ) . 

2. Update X w+i = X n + Y!jL\ v j K j and Wi =t n + x. 



This method is effective but a little crude: unless sim- 
plicity is the most important consideration or r must be 
fixed, as is the case when used in the context of Richardson 
extrapolation [22], it is advisable to use more advanced 
r-leap methods. 

Adaptively changing r at each step can give even 
greater gains in speed, and this is easily introduced into 
Algorithm 2. A successful approach in current implemen- 
tations is to find r such that the mean and variance of the 
change in propensities over [ t, t + r) are bounded by some 
fraction e 1 of aj(X n ). The advantage of this is that r 
is controlled to stick more closely to the leap condition, 
ensuring better accuracy, while at the same time maxim- 
imising r for faster simulations. There have been several 
successive improvements for best selecting r [12-14], and 
these methods can achieve a very high efficiency. 

Methods 

Extrapolation 

Richardson extrapolation is a technique for increasing the 
order of accuracy of a numerical method by eliminating 
the leading error term(s) in its error expansion [23,24]. It 
involves numerically solving some deterministic function 
Y(t) at a given time T = nx using the same solver with dif- 
ferent stepsizes, where we define as an approximation 
to Y(T) at time T using stepsize r. Y(T) can be written as 

Y(D =Y r T + s g (T), 

where s g (r) is the error of the approximate solution com- 
pared to the true one. For a general numerical solver, Sg(x) 
can be written in terms of powers of the stepsize r: 

s g (r) = e kl x h + e, <2 x h + e h x h + . . . , (2) 

where the are constant vectors and depend only on the 

final time and k\ < ki < /<3, Eq. (2) tells us that this 

method has order of accuracy k\. 



Essentially, Richardson extrapolation employs polyno- 
mial extrapolation of approximations Y^, q = 1, 2, . . . and 
ri > X2 > . . ., to estimate Y^, i.e. the numerical solution 
in the limit of zero stepsize, which corresponds to Y(T) 
(Figure 1). Each successive extrapolation removes the next 
leading error term, which is the largest contribution to 
the error, thereby increasing the accuracy of the numerical 
solution and allowing it to better estimate Y(T). 

To demonstrate this, assume a numerical method with 
stepsize r has an error expansion of 

Y(T) -Y T T = eir + e 2 x 2 + 0(x 3 ) 

For instance, the well-known Euler method for solving 
ODEs has such an error expansion. Now instead of r , if we 
use a stepsize r/2, the error expansion is 

2 

Y(T)-Y r / 2 = ei T -+e 2 X —+0(T i ) (3) 

z 4 

We can take Y^ r/2 = 2Y^ /2 - Y X T , giving 
r 2 

Y (D - Y T / /2 = -e 2 — + G(x 3 ) (4) 

The leading error term has been removed, resulting in 
a higher-order approximation. This can be repeated to 
obtain an even higher order of accuracy by using more ini- 
tial approximations Y^ , . . . Y^, where q can be any integer 
and x\ > X2 > . . . x q . We define Yj' Tq as the extrapolated 
solution using initial approximations Y^, . . . Yj?. The eas- 
iest way of visualising this is to build up a Neville table 
(also called a Romberg table) from the initial approxima- 
tions (Table 1). 

The first column of the table contains the initial numer- 
ical approximations. These are then extrapolated to find 
the next column, and so on. For instance, with three initial 
solutions Y* T , Y T / 2 , Y T / 4 , then Y T / /4 = f Y T / 2 ' r/4 - \ Y T T r/2 
(this is easily calculated by first writing down a similar 
formula to Eq. (3) for Y^/ 4 , then one similar to Eq. (4) for 
yr/2,r/4^ more for Y^ 4 ). At each subsequent 

column, the next leading error term is cancelled, giving a 
yet higher-order solution. The correct coefficients to cal- 
culate each new term of the Neville table can be found 
from 

Y T 1-r> T 1 = P *T ~ X T 

T " p k «~\ 

where p — x q - r /x q - r +i and k q is the order of the solution 
at column q (c.f. Eq. (2)), and r = 1. This can 
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be generalised to any order method with any appropriate 
error expansion. The only condition for extrapolation is 
existence of an error expansion of the form in Eq. (2). 

Bulirsch-Stoer method 

The Bulirsch-Stoer method is an accurate ODE solver 
based on Richardson extrapolation [25,26]. A Neville table 
is built by repeated extrapolation of a set of initial approx- 
imations with stepsizes that are different subintervals of a 
larger overall step r, and is then used to find a very accu- 
rate solution. This happens inside each timestep, allow- 
ing r to be varied between steps. A modified midpoint 
method (MMP, Algorithm 3) is used to generate the initial 
approximations in the first column of the table. This lends 
itself well to an extrapolation framework, as the MMP 
subdivides each step r into h substeps f = x/h. Further- 
more, crucially, the error expansion of the MMP contains 
only even powers of f , resulting in fast convergence [27]. 



Algorithm 3 Modified midpoint method (MMP; 
described in [28]) 

With f(t, Y(0) = ^f- and Y(0) = y 0 , assuming the 
system is in state Y n at time t m and a substep f = r /h: 

1. Setz 0 =Y n . 

2. Calculate first intermediate stage 
zi = z 0 + if(t n ,z 0 ). 

3. Calculate next intermediate stages 

z m+ i =z m -i+2Tf(t n +mT,z m ), m = l, ...,h-l. 

4. Update Y n+ i = \ (z% + + rf(t n + r, z%)) and 

tn+l = t n -\- T, 



We give a brief overview of the deterministic Bulirsch- 
Stoer method here; Ref. [28] has an excellent description 
of the algorithm, as well as a guide to its implementa- 
tion. At each step, a column of the Neville table, /<, in 



which we expect the approximate solutions to have con- 
verged, as well as a stepsize x are selected. The Neville 
table is then built up by running k MMPs, with stepsizes 
fi = r/2, ...,r q = z/n q , where n q = 2q, q = 1, 2, . . . , k 
and successively extrapolating the appropriate numerical 
approximations. The convergence of the solutions is eval- 
uated based on the internal consistency of the Neville 
table, that is, the difference between the most accurate 
solution in column k and that in column k — 1: from 
Table 1, this is AY(/c, k-l) = Y\ l,ik - Y? A . As successive 

initial approximations Y/ are added to the first column, 
the extrapolated results in each new column converge 
to the true solution and AY(/c, k — 1) shrinks. The final 
approximation at column k is acceptable if err^ < 1, where 
errk is a scaled version of AY(/c, k — 1) (see Appendix for 
more detail). If err^ > 1, the step is rejected and redone 
with r = |. 

In a practical implementation, the initial step tests over 
q=l i ... i k max , where k max is usually set as eight, in 
order to establish the k necessary to achieve the required 
accuracy and ensure the stepsize is reasonable; subse- 
quent steps then test for convergence only in columns 



Table 1 Neville table built from q initial approximations 
Y T j , . . . , Y T T q with order ki (first column) and extrapolated 
to find a solution of order k q/ that is y T j' Tq 



Order 




k 2 


k 3 ... k q 




Y? 














Approximate 


Yp 




T 7" 


solutions 




V T 2' r 3 


i „Ti ,t q 

T r 




Yp 




v ~Cq-2,Tq 






,t q 






\r r q 

T r 
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k — l,k and k + 1 [28]. Because of its accuracy, the 
steps taken by the Bulirsch-Stoer method can be rela- 
tively large compared to other numerical solvers, r is 
changed adaptively at each step, and is chosen to min- 
imise the amount of work done (i.e. function evalua- 
tions h + 1 of the MMP) per unit stepsize. In this 
way the Bulirsch-Stoer method adapts its order and 
stepsize to maximise both accuracy and computational 
efficiency. 

Stochastic Bulirsch-Stoer method 

The SBS method is based on its deterministic counter- 
part described in the previous section. There are some 
key issues that must be addressed in order to suc- 
cessfully adapt it into a stochastic method. The two 
most important ones are interlinked: first, what quantity 
should be calculated at each step, and second, how can 
stochasticity be introduced into the picture? The deter- 
ministic Bulirsch-Stoer method calculates X w+ i from X n 
using the MMP to find the intermediate stages over 
[t,t + z). However, stochasticity cannot simply be added 
to this scheme either inside or outside the MMP, as 
this would interfere with the extrapolation necessary for 
the Neville table. In order to update the state vector as 
in Eq. (1), we must find the number of reactions per 
step. 

Looking at the update formula for the trajectory of a 
jump Markov process [29], 



X„ +1 =X H + £v,Wr aj(X(t))dt) 

j—l \Jt n / 



(5) 



it is clear that the quantity we must calculate is 
// w+r a(X(t))dt, in order to then take a Poisson sample 
for the update (the r-leap method approximates this as 
a(X(t n ))r). Thus, rather than calculating X w+ i directly 
using the MMP, we need an accurate way to find the inte- 
gral of the propensity functions over each step. Proceed- 
ing in a somewhat similar way to Algorithm 3, we arrive 
at Algorithm 4: the intermediate stages are found using 
the MMP, and the propensities calculated at each stage. 
These intermediate propensities are then fed into a com- 
posite trapezoidal method to give an accurate estimate of 
the integral. 

An important point is that the intermediate stages 
are solved using the reaction rate equations (Steps 1, 
3, 5 of Algorithm 4), which give the expectation of the 
stochastic trajectory over each step provided both are 
started in state X n at time t n . Thus we find the expected 
using Romberg integration, and use this 
to sample a Poisson distribution in order to increment 
X n . This method is both extremely accurate at finding 
the mean and fully stochastic, that is each simulation 
gives a different stochastic realisation and the full prob- 



ability density can be found from a histogram of many 
simulations. 



Algorithm 4 Integration of propensities over each step 
With X(0) = xo, assuming the system is in state X n at time 
t n , and a substep i = r/h: 

1. Set zo = X n . 

2. Calculate initial propensity, a' 0 = a(zo). 

3. Calculate first intermediate stage z\ = Zq + f va' 0 . 

4. Calculate first intermediate propensity, a' x = a(z\). 

5. Calculate next intermediate stages 

z m+ \ = z m -\ + 2iva ! m , ra = 1, — 1. 

6. Calculate next intermediate propensities 



= a(z m +i), m = l,...,h-l. 



7. Calculate integral of propensities using the 
composite trapezoidal rule, 



h-l 



Aa T (t n ,t n + r) 



. + E 2 < 



+4 



m—l 



pt n +T 

/ a(X(t))dt 

Jt n 



8. Update t n +\ = t n + r. 



We have now arrived at the implementation of the SBS. 
First, we calculate Aa T( i(t n , t n + r), the expected integral 
of the propensities over [t n ,t n + r), using Algorithm 4 

with multiple stepsizes fi, ?2, We then extrapolate 

these using the Neville (Romberg) table to arrive at the 
extrapolated solutions Aa extr (t n , t n + r); this is known as 
Romberg integration. Once these are sufficiently accurate, 
we sample the number of reactions as 

M 

x„ +1 =x n + J2 V J V ( Aa j xtr ^ *» + r) ) • (6) 

This is our approximation to the underlying probability 
density function at each step. Combined with the extrapo- 
lation mechanism described previously and a way to adapt 
the stepsize, we have the full SBS method. 
The stepsize is chosen by calculating the quantity 



(—) 



l 

2(*-l)+l 



(7) 



where r is current timestep, is the hypothetical next 
timestep for Romberg table column k and S\ and S2 
are safety parameters, introduced in the next paragraph. 
Here err^ is the local error relative to a mixed tolerance, 
with order 0(r 2 ^ _1 ^ +1 ) (see Appendix). Its ideal value is 
exactly one: if it is any smaller than this the step could 
have been made bigger, and if it is any larger it means 
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our error bound is exceeded and the step must be redone 
using a smaller r. At each step, a candidate next timestep 
is selected for each Neville table column k. As we know 
some measure of the work done for each k (the number 
of function evaluations of the MMP), we can calculate 
the efficiency of each of the k candidate timesteps as the 
work per unit r. We then select the candidate that gives 
the highest efficiency. For brevity, we have left the full 
description and step-by-step implementation of the SBS 
(Algorithm 5) until the Appendix. 

The SBS uses several different parameters (see 
Algorithm 5), all of which have some effect on the results. 
S\ and S2 are both safety factors that resize the next 
timestep by some amount: the smaller they are, the 
smaller the timestep and the more accurate the solution. 
As always, however, there is a compromise between step- 
size and speed, so one must be careful to optimise the 
parameters for maximum efficiency. The same is also true 
for the vectors a to i, the absolute error tolerance, and r to /, 
the relative error tolerance. These are used to scale the 
error that is calculated from the internal consistency of 
the Romberg table. They are usually set fairly low: around 
10 -6 is common. There is an additional consideration 
with the SBS, namely that of the column of convergence, 
k. Even when the safety factors are set high (meaning 
larger timesteps), the SBS can achieve very high accuracy 
by simply doing another extrapolation, and going to a 
higher column. For this reason, the relationship between 
the safety factors and accuracy is not a direct one, and 
it is advisable to check the timesteps and column of 
convergence for each new set of parameters. 

Extension: SBS-DA 

There is an alternative scheme to Eq. (6) for finding the 
stochastic update to the state vector: this is the 'degree of 
advancement', or DA approach, and we call the resulting 
method the SBS-DA. Its focus is theMx 1 random process 
Zj(t),j = 1, . . . , My the number of times that each reaction 
occurs over [ 0, t) [30,31]. Zj(t) is related to the state vector 
X(f) by 



M 



X(f)=X(0) + £v ; -Z ; -(f). 



In fact, X(t) is uniquely determined by Z(t), where Z(t) 
is an M x 1 vector [31]. This allows us to use the DA 
approach to calculate the number of reactions per step, 
then return to the population approach to update the state 
vector, using 



M 



X(t + r) = X(f) + v i Z A t + 



(8) 



where we define Zj([t,t+ r)) as the number of reactions 
occurring over [t,t + r). Notice that Eq. (8) has the same 
form as Eqs. (1) and (6). In fact, Kj = Zj([ t,t + r)): in 
the case of the SSA, the timestep tends to be very small 
and only one reaction occurs, but for the r-leap and SBS 
it is much larger so more reactions can occur. Similarly to 
Eq. (5), we know that [32] 



aj(Z(s))ds 



(9) 



In order to find the update of the state vector, we must 
solve for the mean (and variance, see below) of Kj, and 
sample according to Eq. (9). The equations for the evo- 
lution of the mean and variance of Kj, iij(s) and Vj(s), 
respectively (where s runs only over one step [t,t + r)), 
can be derived from its master equation, and take the form 
[19] (see also [31]) 



dfij(s) 
ds 

dVjis) 
ds 



M 



= J2fjf ( X *)/V (*) + a i^n\ flj(fi) = 0, (10) 

dfij(s) 



= 2fjjVj(s) + 



ds 



Vj(t) = 0, (11) 



where s e[ t, t + r), X n is the value of the state vector at 



N 



the start of the step and fjf(X n ) = ^ 



i=l 



daj(X n ) 
dXi 



Vif*b) 



1,...,M are the elements of an M x M matrix (note that 
we only deal with its diagonal elements in the case of the 
variance). Eqs. (10) and (11) must be solved simultane- 
ously with initial conditions fij(t) = Vj(t) = 0 to find 
\ij{t + r) and Vj(t + t). It should be noted that they are 
only exact for systems with linear propensities. In the case 
of non-linear propensities, the moment equations contain 
higher moments and we obtain Eqs. (10) and (11) by a 
standard closure argument: we Taylor expand the propen- 
sities and truncate at first-order [19]. In fact, Eq. (11) is 
only necessary because Eq. (10) is not exact in the general 
case. For larger timesteps this may lead to a sizeable error, 
so we must approximate the true, Poisson, distribution of 
Kj with a Gaussian whose variance has been corrected. 
This leads to the update scheme 

Kj{tf Xtr {t + T),Vf Xtr {t + T)) 



if /^ r (£ + r)<10, 



/=1 



\N ^ xtr (t + r), ^^^+1))] if fif tr (t + r) > 10, 



which now replaces Eq. (6). Here [ ] denote rounding 
to the nearest integer and the value ten has been cho- 
sen heuristically as above this value a Poisson sample 
can be well represented by a Gaussian sample with the 
appropriate mean and variance. 
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This approach is somewhat similar to the unbiased 
r-leap [19]. The key difference is that Ref.[19] uses this 
scheme in the context of a fixed-stepsize r-leap method, 
basing the entire method around this scheme. In con- 
trast, the SBS-DA is grounded in the Bulirsch-Stoer 
method, which it uses for its stepsize selection and com- 
bination of MMP and Richardson extrapolation to find 
the Poisson increment. The DA approach is only one part 
of the whole SBS-DA, and is only used as an alternative to 
Eq. (6), in order to also find the variance of the number of 
reactions occurring over each step. 

The SBS and SBS-DA methods both calculate the 
parameter of the Poisson sample but they take different 
approaches to this. The two key differences are that (1) 
the SBS-DA attempts to correct using the variance of the 
sampled distribution in order to better approximate the 
true Poisson parameter when the stepsize is large, but (2) 
it sacrifices some of its performance because of the inher- 
ent inaccuracies of Eqs. (10) and (11) (see Results, Higher 
order of accuracy and robustness section). 

Results and discussion 

To illustrate their effectiveness, we apply the SBS and 
SBS-DA methods to four example problems of varying 
degrees of complexity. We compare them with the popular 
benchmark of the Euler r-leap method (TL; most recent 
formulation) [14], and we also selected two newer meth- 
ods that are intended to be representative of the most 
current, fastest and most accurate methods. These are the 
# -trapezoidal r-leap (TTTL) [21], which has two stages 
and weak order two, and the unbiased r-leap (UBTL) 
[19], which accurately estimates the mean and variance 
of the number of reactions that occur during one step. 
Although the authors of these methods have used fixed 
stepsizes in their works, we have implemented their meth- 
ods using the same r -adapting scheme as the Euler r-leap. 
This actually makes them more advanced than originally 
described, but we believe this ensures a fairer comparison 
with the SBS. 

We use four example problems: a simple chain decay, 
the Michaelis-Menten reactions, the Schlogl system and 
the mutually inhibiting enzyme system. All the methods 
we tested have parameters that can be varied: for the SBS 
methods these are r to /, <%,/, Si an d S 2 , and for the r-leap 
methods it is 6 (and 0 for the TTTL, which was always set 
as 0.55). In the former case, we chose to focus our atten- 
tion on Ytob as ^ plays a somewhat similar role to e in 
the latter ones, i.e. as a relative bound for the errors. For 
each system, we produced a plot of the 'histogram error' 
(see below) versus runtime for several values of r to i and e. 
We only varied these single parameters, listed in Table 2: 
the other parameters of the SBS were chosen to maximise 
the overlap between the runtimes of the five methods, 
and kept constant. This was solely to facilitate comparison 



between the different methods, and these values do not 
necessarily fall in the normally useful ranges of those 
parameters. In order to discriminate between the meth- 
ods, the plots could be used to choose a CPU time and 
check which method has the lowest error at that point, or 
to find which method takes less time to run for a set error 
level. 

The same number of simulations were run with all 
methods. We plotted probability density functions (PDFs) 
for each species and compared them to ones obtained 
from a reference set of 10 6 SSA simulations, all generated 
from histograms using identical bins. We defined the his- 
togram error as the L 1 distance between the probabilities 
of each method and the SSA in each bin. The runtime 
is the time taken to run a single simulation, obtained by 
dividing the total runtime by the number of simulations. 

We show the probability distributions of all the simula- 
tion methods, as well as plots of histogram error versus 
(single) runtime. We refer to the latter as efficiency' plots, 
as they clearly indicate some measure of computational 
efficiency. If a method is both fast and has low error, it is 
efficient: its points are concentrated towards the origin. In 
contrast, points to the top right indicate low efficiency (i.e. 
a slow and inaccurate method). 

Chain decay system 

We start with a simple test system that has linear propen- 
sity functions (i.e. aj(x) oc x). The system has three species 
that are converted into each other by the reactions 

X 1 % X 2 , ci = 1, 
X 2 % X 3 , c 2 = 1. 

The simulations were started in initial state X(0) = 
[ 10000, 1, 0] T and simulation time was T = 5. We ran 
5 x 10 5 simulations. The SBS safety factors were S\ = 
0.2, S 2 = 0.4, those for the SBS-DA were Si = 0.15, S 2 = 
0.2, and a to i = 10 -6 for both. Probability distributions 
of the simulation results are shown in Figure 2a for X\. 
For clarity, the figure shows only the results for the most 
and least accurate parameter values. The UBTL and SBS 
methods' PDFs both match the SSA very closely; the other 
methods are less accurate. This is quantified in Figure 2b: 
the UBTL returns the lowest errors, followed by the SBS- 
DA and SBS. This is not surprising: for linear systems, 
the UBTL (and SBS methods) are exact. For this system, 
taking into account all three chemical species, the SBS 
methods and the UBTL are the most efficient (Table 3). 
We have included the efficiency plots for all species in the 
Additional file, and we have defined a quantity to estimate 
the total measure of efficiency across all species; these are 
described in the Further comparisons section. 
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Table 2 Parameters varied for SBS, SBS-DA, TL, TTTL and UBTL for each test system in order from fastest to slowest 
(left to right in efficiency plots) 



Figure System Method Parameters (r f0 / for SBS methods, e for TL methods) 







SBS 


1CT 5 


5 x 10~ 6 


10 -6 


5 x IO" 7 


2.5 x 10~ 7 


10- 7 






SBS-DA 


10- 5 


5 x 10~ 6 


10~ 6 


5 x 10~ 7 


10- 7 


10- 8 


Figure 2 


Chain decay 


TL 


0.125 


0.1 


0.075 


0.05 


0.04 


0.03 






mL 


0.2 


0.15 


0.1 


0.075 


0.05 


0.04 






UBTL 


4 


2 


1 


0.8 


0.6 


0.5 






SBS 


10- 2 


io- 3 


10- 4 


10- 5 


1Q -6 


10- 7 






SBS-DA 


10- 3 


io- 4 


10- 5 


1Q -6 


10- 7 


10- 8 


Figure 3 


Michaelis-Menten 


TL 


0.2 


0.15 


0.1 


0.07 


0.05 


0.04 






mL 


0.3 


0.25 


0.2 


0.1 


0.075 


0.06 






UBTL 


7 


5 


3 


2 


1.5 


1 






SBS 


10~ 5 


8 x 10~ 6 


4 x 10~ 6 


10 -6 


5 x IO -7 


10- 7 






SBS-DA 


2 x 10~ 4 


1.5 x 10~ 4 


io- 4 


9 x 10~ 5 


8 x 10~ 5 


6 x 10~ 5 


Figure 4 


Schlogl 


TL 


0.046 


0.044 


0.042 


0.04 


0.0375 


0.035 






mL 


0.06 


0.056 


0.052 


0.048 


0.046 


0.044 






UBTL 


0.3 


0.28 


0.26 


0.24 


0.22 


0.2 






SBS 


10~ 5 


10 -6 


10~ 7 


10- 8 


10- 9 


10 -10 






SBS-DA 


10- 5 


10 -6 


10~ 7 


10- 8 


10- 9 


10 -10 


Figure 5 


Enzymes 


TL 


0.15 


0.1 


0.07 


0.05 


0.03 


0.02 






mL 


0.3 


0.2 


0.12 


0.08 


0.06 


0.04 






UBTL 


5 


3 


1.5 


0.8 


0.4 


0.3 



Michaelis-Menten system 

This is a well-known system in biochemistry and is often 
used to test computational simulations. It is a model of an 
enzyme (X2) catalysing the production of some molecule 
(X4). It consists of four chemical species undergoing the 
reactions 



Xi+X 2 


ci 


X3, 


c\ 


= 10" 


X3 


—> 


Xi+X 2 , 




= 0.5, 


X3 


— > 


X2 + X4, 


C3 


= 0.5. 



The initial state was X(0) =[ 1000, 200, 2000, 0] T and 
the simulation time was T = 10. We ran 10 6 simulations. 
The SBS safety factors were set as S\ = S2 = 0.35, and 
those of the SBS-DA as Si = S 2 = 0.33, with a toi = 10~ 6 
for both. The PDFs and efficiency plot for X\ are shown 
in Figure 3. The SBS, SBS-DA and TTTL all achieve 
high accuracy. The TTTL becomes more accurate than 
the SBS methods at longer runtimes, but the SBS meth- 
ods have the advantage at shorter runtimes. Thus when 
it is important to minimise runtime, the SBS methods 
are preferable. Overall, the SBS-DA has the highest effi- 



(a) 



0.04 



0.02 




120 



(b) 



10 u 



10 



10 



10 



** * 

□ 


A SBS 

SBS-DA 
★ TL 
□ TTTL 
+ UBTL 


□ 

□ 

> a - 

^ + Of A ^ J 
* + + 


□ 

+ 



0.05 



0.1 

runtime (sec) 



0.15 



0.2 



Figure 2 Chain decay system, (a) PDFs of X] generated from 5 x 1 0 5 simulations. Only the PDFs of the most and least accurate error parameters 
are shown, (b) Histogram error of each method as compared to the PDF ofX] simulated with the SSA. Parameters varied are listed in Table 2. 
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Table 3 Efficiencies of each method as calculated according to Eq. (1 2) for each test system 


(higher is better) 


System 


SBS 


SBS-DA 


TL 


TTTL 


UBTL 


Bin sizes (for species 1, 2, ... AO 


Chain decay 


14.7 


22.3 


0.4 


4.4 


14.1 


2,5,5 


Michaelis-Menten 


7.1 


11.7 


2.3 


7.7 


1.2 


5,5,5,5 


Schlogl 


19.1 


7.1 


18.8 


18.8 


0.3 


10 


Enzymes 


21.9 


50.5 


12.6 


6.3 


3.0 


100, 100,50,50,50,50,50,50 



Histogram bin sizes for each species are listed in order. 



ciency, with the TTTL second and SBS a close third 
(Table 3). 

Schlogl system 

The Schlogl system is useful as a test system that is both 
bimodal and non-linear, while at the same time being 
very simple. It is bimodal in species X with a high and a 
low stable state, although this is only the case for certain 
parameter combinations. It consists of the four reactions 



A + 2X 



3X, 



ci = 3 x 10" 



3X % A + 2X, c 2 = 10" 4 , 



B%X, 
X%B, 



c 3 = 10-' 
C4 = 3.5, 



and species A and B are held constant at 10 5 and 2 x 10 5 
units respectively. We used the initial condition X(0) = 
250, which is an intermediate value between the two stable 
states. The simulation time was T = 10, and we ran 10 5 
simulations for each method. The SBS safety factors were 
Si = S 2 = 0.05, those for the SBS-DA as Si = S 2 = 0.125, 
and a to i = 10 -6 for both. 

The PDFs and efficiencies of each method are shown 
in Figure 4 for X\. For this system, the TL is surprisingly 
accurate compared to the other methods. The SBS and 
TTTL have approximately the same efficiency as the TL, 
with the SBS-DA being somewhat less efficient and the 
UBTL the least (Table 3). 



Mutually inhibiting enzymes system 

This system has 8 chemical species and 12 reactions 
[33,34]. It represents the interactions of two enzymes, Ea 
and Eb, and their products, A and B, respectively. Each 
enzyme reacts with some substrate (that is not accounted 
for in the model) to create its product. These products 
then go on to inhibit the other enzyme. Thus, if initially 
there are more Ea or A, this reduces the chances of B 
being produced, and vice versa. This makes the system 
bistable in the products. This system is a good example 
of the double-negative feedback mechanism that is very 
common in cell biology. Here, however, we use a param- 
eter set that does not result in bistability. The reactions 
are 



E A 


— > 


E A +A, 


c\ = 15, 


E b 




E b +B, 


c 2 = 15, 


E a +B 


C 4 


E A B, 


c 3 = 5 x 10 -4 , C4 = 2, 


E A B+B 


C5^ 
C6 


EaB2, 


c 5 = 10- 3 ,c 6 = 6, 


A 


C7 


0, 


c 7 = 5, 


E B +A 


C 9 


EbA, 


c 8 = 5 x 10" 4 , c 9 = 2, 


E B A+A 


cio 
c\\ 


E B A 2 , 


cio = 10" 3 ,cn = 6, 


B 


CY2 


0, 


Ci2 = 5. 
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Figure 3 Michaelis-Menten system, (a) PDFs of X] generated from 1 0 6 simulations, and (b) histogram error of each method as compared to the 
PDF of X] simulated with the SSA. Parameters varied are listed in Table 2. 
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(a) 



0.025r 




SSA 

A SBS, r to| =10" 7 
SBS, r to| =10" ; 
D SBS-DA, r to| =6x10 _l 
SBS-DA, r to| =2x10"' 
• • * TL, e=0.035 
— * — TL, £=0.046 

□ TTTL, £=0.044 
— a — TTTL, £=0.06 
+ UBTL, £=0.2 
— I UBTL, £=0.3 



800 



(b) 



10 



b" 10 



10 



+ + + 4 






A SBS 




SBS-DA 




* TL 




TTTL 




+ UBTL 


0 A A 


0 





0.7 0.8 0.9 1 1.1 1.2 1.3 

runtime (sec) 



Figure 4 Schlogl system, (a) PDFs of X] generated from 1 0 5 simulations, and (b) histogram error of each method as compared to the PDF of X] 

simulated with the SSA. Parameters varied are listed in Table 2. 



The initial state was set to X(0) = [20000, 15000, 9500, 
9500, 2000, 500, 2000, 500] r , where X = [A,B,E A ,E B , 
EaB, EaB2, EbA, EbA2] t , and the system was simulated 
2 x 10 5 times for time T = 2. We used safety factors 
of Si = S 2 = 0.4 for the SBS and Si = 0.55, S 2 = 0.7 
for the SBS-DA, with a toi = 10~ 6 . The PDFs and effi- 
ciencies for X\ are shown in Figure 5; again the TL is 
unexpectedly efficient, with only the SBS and SBS-DA 
more efficient overall (see Table 3). At the longest run- 
times, both the TTTL and TL are more accurate than the 
SBS-DA and similar to the SBS. However, as runtime is 
decreased, the SBS remains very accurate whilst the TTTL 
and TL quickly lose accuracy, and for shorter runtimes 
the SBS-DA is also more accurate than them (Figure 5). 
Taking into consideration all eight species, it is, in fact, 
the SBS-DA that is most efficient, followed by the SBS 
(Table 3). 

Further comparisons 

All of our test systems have more than one species, and so 
far we have only presented results for X±. This can often 
be unrepresentative of the full picture. The chain decay 



system is a clear example of this. Additional file 1: Figure 
Al shows the efficiency plots for all three species. Only 
looking at X\ could lead one to think that the UBTL is 
the most efficient method for simulating this system. But 
including the other two species reveals that the SBS-DA 
is, in fact, the most efficient overall. This is important, 
because it is clear that factors such as linear/non-linear 
propensities, population size and stiffness all affect each 
reaction and species in a different way. Thus it is overall 
performance we are interested in. 

To overcome this problem, we use a way of quantify- 
ing the overall efficiency of each method over all species. 
This follows directly on from our previous definition of 
efficiency: low error and low runtime implies an efficient 
method, high error and high runtime implies an inefficient 
method, and a combination of the two, for instance high 
error but low runtime, clearly lies somewhere between the 
two. We define efficiency' rj as 

(sum(total error over all histogram bins and all species)) -1 

r] — . 

sum(single-simulation runtime over all error parameters) 

(12) 




SBS 

SBS-DA 

TL 



A A* 



0.1 



0.2 

runtime (sec) 



0.3 



0.4 



Figure 5 Mutually inhibiting enzymes system, (a) PDFs of X] generated from 2 x 1 0 5 simulations, (b) histogram error of each method as 
compared to the PDF of X] simulated with the SSA. Parameters varied are listed in Table 2. 
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This varies for each system, and is comparable only 
when the bins used to calculate errors are identical In 
other words, the values are a direct comparison of the effi- 
ciency of each method for each test system, but should not 
be used across different test systems. Table 3 compares 
the efficiencies of each simulation method for every test 
system. 

Additional file 1: Figures Al to A4 contain a full picture 
of our computational results for all four example sys- 
tems and all simulation methods. The overarching trend 
was the following: the SBS was very accurate, returning 
the lowest error in many cases. The TL was unexpect- 
edly efficient for some systems. The TTTL also achieved 
good efficiency for longer runtimes, but the SBS had a 
flatter efficiency curve than the TTTL, with the TTTL 
quickly losing accuracy at lower runtimes even when it 
was more accurate than the SBS at higher runtimes. A 
clear trend emerges: the SBS is a very accurate method. 
Moreover, it is the most efficient method we tested, main- 
taining its accuracy better at low runtimes than the other 
methods. 

SBS methods excel when we want a short runtime 
with high accuracy. In this case, we set the safety factors 
high, allowing large steps and a corresponding increase in 
extrapolations. This retains a high accuracy, whilst reduc- 
ing runtime because of the large timesteps. In contrast, 
when we allow a longer runtime, we set the safety fac- 
tors low, restricting the stepsize and removing the need 
for higher extrapolation. In many of our test examples, we 
have seen the SBS only using one extrapolation through- 
out the simulation. This is a waste of the extrapolation 
capability of the SBS, and it is no surprise that in these 
cases it is not the most efficient method, especially as the 
stepsize adaptation scheme adds some overhead to each 
step. 

Higher order of accuracy and robustness 

A thorough study of the order properties of r -leaping 
methods was first given by Rathinam et al [35], who 
showed that for linear reactions the Euler r-leap method 
is weak order one in the moments under the scaling 
r — >> 0. This analysis was extended by Li [36] to non-linear 
propensity functions by considering SDEs driven by Pois- 
son random measures (see also [37]). Li showed that the 
Euler r-leap method is precisely the Euler method applied 
to this SDE and hence inherits the properties of strong 
order half and weak order one. However, there are issues 
with using the scaling condition r -> 0 as the r-leap 
condition requires that J^*z/(X)t ^> 1. Anderson et al 
[38] overcame this scaling condition by considering order 
under a large volume scaling Q — »► oo. In this case, by 
letting X/Q = (9(1) and with r = Q~P,0 < 0 < 1, 
global strong and weak order convergence can be estab- 
lished. Hu et al [39] investigate these issues in greater 



detail through the use of rooted tree expansions of the 
local truncation errors for the moments and covariance, 
thus generalising the approach first applied to SDEs by 
Burrage and Burrage [40] . This analysis shows that while 
some r-leap methods may have higher order moments 
(for instance, the midpoint r-leap has order two moments 
for linear systems), their covariance is invariably of unit 
order, unless this is specially taken into consideration (as 
with the TTTL method, which has order two moments 
and covariance). As Hu et al [39] point out, these issues 
arise as a consequence of the differences between the 
infinitesimal generators for deterministic ODEs and jump 
processes. 

It is well-known that the Bulirsch-Stoer method has 
a high order of accuracy: this is the reason it is able 
to use large steps whilst still finding very accurate solu- 
tions. This is because of the Richardson extrapolation 
that is used at each step on the MMP solutions (which 
themselves have order two as well as an error expansion 
containing only even powers of f , resulting in very high 
order solutions with little work). In contrast, rather than 
the MMP solutions for X(t), the SBS instead extrapo- 
lates at each step the deterministic quantities Aa T i (t n , t n + 
r) (or the mean and variance of Kj given by Eqs. (10) 
and (11) in the case of the SBS -DA), calculated using 
the composite trapezoidal rule, which also has a known 
error expansion. Thus the extrapolation is performed 
on a deterministic variable: the mean of the Poisson 
update. 

We investigated the behaviour of the SBS and SBS -DA 
methods on two simple systems: first, the linear system 
X -> 2X, X(0) = 1000, and second, the non-linear 
system X + Y -> 0, X(0) = Y(0) = 10000. As the 
SBS changes both stepsize and Romberg table column k 
(i.e. order of accuracy) adaptively, we used a restricted 
version, which had both a fixed stepsize and k. We ran 
simulations with both SBS and SBS -DA, for k = 1 and 
k = 2, that is no extrapolation and one extrapolation, 
respectively. In addition, we also used the TL and TTTL 
methods for comparison, as they have known weak orders 
of accuracy (one and two, respectively). The gradients 
of the errors for the different methods are computed 
based on a linear least-squares regression of the data 
points. 

We find that both SBS and SBS -DA can have high 
weak order in the mean (in certain cases, such as for 
large timesteps and populations; this is discussed below). 
For the linear system, both methods have weak order 
approximately two and four in the mean for k = 1 and 
k = 2, respectively (Figure 6). However, there is a differ- 
ence in behaviour between the SBS and SBS -DA for the 
non-linear system (Figure 7). Here, the SBS -DA is lim- 
ited to at most weak order two in the mean, even when 
extrapolated. In contrast, the SBS is limited to at most 
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Figure 6 Order of accuracy for linear system. Error versus stepsizefor (a) mean, and (b) variance of the linear system X -> 2X, with c = 0.2, 
X(0) = 1 000, 7 =12. The gradients of linear regression lines fitted to the points are shown in the legend. 



weak order four when extrapolated. This shows the limi- 
tations of Eqs. (10) and (11): for non-linear systems, they 
limit the order of accuracy of the mean of the SBS-DA 
to two. This is not the case for linear systems, as here 
Eq. (10) is exact, so the order of the SBS and SBS-DA is 
identical. 

The clear message we can take from Figures 6 and 7 
is that the SBS does behave as if it had higher weak 
order in the moments, and this order increases as the 
Romberg table column (that is, number of extrapola- 
tions) is increased. However, we cannot tell whether this 
trend continues to higher extrapolations as these are so 
accurate that Monte Carlo error interferes with our abil- 
ity to reveal the weak order. On the other hand, the 
SBS-DA behaves as if it has at most weak order two in 
the mean for non-linear systems, but this restriction in 
weak order is compensated for by the use of an appro- 
priate Gaussian sample when the Poisson parameter is 
large, and generally it is similarly, or even more, efficient 



than the SBS. However, in neither case does extrapo- 
lation increase the weak order of the variance beyond 
one. 

Thus one can legitimately ask whether our approach 
offers any advantage over, for example, the TTTL method, 
which has weak order two in both the moments and the 
variance. This can be addressed by perusal of Figures 2, 
3, 4 and 5, where we compare the distances of the PDFs 
of the numerical methods and the exact solution (as com- 
puted by the SSA) as a function of the runtime. Of the 
methods tested the SBS appears to be the most robust 
and efficient, even though the TTTL has weak order two 
in the variance. It is the criteria of efficiency and robust- 
ness that are the most important properties of any good 
numerical method. We claim that these properties are 
intrinsic to the SBS along with its ability to adaptively 
select the timestep and number of extrapolations to carry 
out, thus maximising efficiency whilst keeping accuracy 
high. 
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Figure 7 Order of accuracy for non-linear system. Error versus stepsizefor (a) mean, and (b) variance of the non-linear system X + Y -> 0, with 
c = 10 _5 ,X(0) =[10000, 10000] 7 , T = 12. The gradients of linear regression lines fitted to the points are shown in the legend. The first three points 
of the SBS with k = 2 are omitted from the regression line, as they are clearly affected by Monte Carlo error. 
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Now we discuss the order of accuracy behaviour of our 
methods. First of all, we are not claiming that they have 
high weak order uniformly in the stepsize, and we have no 
such proof apart from in the linear case when the higher 
order is inherited directly from the underlying determin- 
istic extrapolation methods. However, this statement gives 
us a key insight into considering the behaviour of numeri- 
cal methods when applied to SDEs with small noise of the 
form 



dX(t) =f(t, X(t))dt+sg(t, X(t))dW(t), 



X(t 0 ) = x 0 , 
(13) 



where e > 0 is a small-noise term. It is well-known that 
Langevin SDEs represent an intermediate regime between 
discrete stochastic chemical kinetics and the determinis- 
tic regime, arising as the number of molecules X in the 
system increases. In particular, s behaves as -j= [11]. For 
such systems, Milstein and Tretyakov [42] showed that the 
global weak order of numerical methods to solve the above 
SDE has the general form 0(r p + r q s r ), where q < p. 
When noise is ignored {s = 0), the SDE becomes an ODE 
and the weak order of its approximate solution is just the 
deterministic order term 0(z p ). Milstein and Tretyakov 
[41] also performed an analysis in the strong sense, and 
again found the general form of the global strong error 
to be 0(t p + r q s r ), q < p. In addition, Buckwar et al. 
[43] also examined small-noise SDEs in a strong sense for 
some well-known classes of Runge-Kutta methods. The 
implication of the extra term in the stochastic order is 
that although the underlying deterministic order of the 
method may be high, the stochastic order is restricted by 
the noise term. However, when the noise is small, this term 
will also become small, thus allowing the stochastic order 
to increase, possibly even up to the deterministic order 
0(z p ). This is also the case if the stepsize is large. In fact, 
it occurs over a range of values of r and s: it is trivial to see 
that the condition for the deterministic term to dominate 

r 

is r » eP-4. 

Now, a standard way of mathematically investigating 
discrete stochastic methods is to analyse SDEs with jumps; 
this is the approach of Li [36]. In particular, Li [36] 
shows that the Euler discretisation of such an SDE is the 
Euler r-leap method. Our hypothesis is that the analy- 
sis of Milstein and Tretyakov [42] is also applicable to 
SDEs with jumps; this then tells us something about the 
behaviour of discrete stochastic methods when noise lev- 
els are medium or small. A small-noise analysis similar to 
that of Milstein and Tretyakov [42] for the SBS is beyond 
the scope of this paper but we postulate that there is a 
small-noise error expansion in both r and s for the SBS 
method. The SBS is most useful when applied to systems 
with relatively larger biochemical populations (thus small 
noise), as it is in these cases that the SSA is prohibitively 



slow and an approximate method is necessary. Combined 
with the fact that the SBS often uses large stepsizes, this 
implies that in many systems of interest, the global weak 
order of the SBS may, in fact, not be far from its deter- 
ministic (high) order. This also explains the behaviour 
of the SBS in our numerical tests, where the timesteps 
were large and the populations moderate (implying 
moderate noise), thus making it likely that the condi- 
tion for the deterministic order term to dominate was 
met. 

Implementation issues 

The speed of the SBS is due to the large steps it takes 
compared to other solvers. We compare the stepsizes for 
all five methods we used in this paper on the Michaelis- 
Menten system (Figure 8). Clearly, the stepsizes are influ- 
enced by our choice of error parameters. We controlled 
for this by using parameters that gave similar (or as close 
as possible) error levels, regardless of runtime. Figure 8 
shows that the largest steps were taken by the SBS meth- 
ods and the UBTL, with the stepsizes being very similar, 
followed by the other two methods. This is not very 
surprising: the UBTL is in some respects similar to the 
SBS -DA, in that it also finds very accurate solutions for 
the moments of Kj at each step. However, the stepsize is 
controlled using a completely different mechanism, so it is 
interesting to see that both employ a similar stepsize for a 
similar error level in this case. 

One peculiarity of the SBS is that it can settle into one of 
several different 'regimes': because it builds the Romberg 
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TL 
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Figure 8 Stepsizes over time. Evolution in time of stepsizes of all 
the stochastic solvers we have compared in the case of the Michaelis- 
Menten system. The largest steps are taken by the SBS, SBS-DA and 
unbiased r-leap, then the ^-trapezoidal r-leap, and finally the Euler 
r-leap. Parameters used were:Si = S2 = 0.8, o to i = 10 -6 , r to i = 10 -4 
for the SBS methods, and 6 = 0.04, 0.1 , 1 for the Euler, ^-trapezoidal 
and unbiased r-leap methods, respectively. 
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table adaptively, it can achieve the same accuracy using a 
larger step and higher extrapolation (i.e. higher Romberg 
table column) or smaller step and lower extrapolation. 
The regime into which the particular simulation falls is 
strongly influenced by the initial stepsize r(0), but often 
changes mid-simulation. For instance, a smaller r(0) is 
more likely to fall into the smaller- r (and lower extrapola- 
tion) regime, and vice versa. Figure 9 shows how r changes 
with time in chain decay system simulations using sev- 
eral different r(0). It is clear that there are two regimes, 
one high-r and one low-r. When r(0) is very small, r 
settles down to the low regime, and only the second 
Romberg table column is used; as r (0) is increased, r set- 
tles in the high regime and uses the third column. When 
r(0) = 1, r initially enters an even higher- r regime using 
the fourth column, but eventually settles into the high 
regime with the third column. In practice, it is advis- 
able to bear this in mind, and choose r(0) accordingly: 
low-r, low-column simulations are more computationally 
expensive and if the same accuracy can be achieved with 
a larger timestep then efficiency can be improved even 
further. 

There are two distinct approaches to determining r(0): 
first, as described previously, we could set r(0) to an 
arbitrary value and run the initial step through as many 
columns as necessary (up to k max ) until it finds the 
required accuracy. Should r(0) be so large that it drives 
the populations negative, it would also be reduced here 
until it reaches a more suitable size for the given prob- 
lem. In addition, if r(0) is still larger than its optimum 
value, it is reduced over the next several steps until it has 
reached this optimum value (and vice versa if it is too 
small). This is the standard approach for the deterministic 
Bulirsch-Stoer method, and it is the one we have taken in 
our simulations. However, in the stochastic regime there 



is another approach: we could set r(0) as some multiple 
of l/<2o( x o) (the expected size of an SSA step in state xn 
[12]), along with an initial guess of the Romberg table col- 
umn to aim for. As l/<2n(xo) is very small, this is a more 
conservative approach, but r is increased to its optimum 
value over the first few steps. It could be useful for sys- 
tems that are very stiff, or that oscillate, whose timestep 
must be very small at certain parts of the solution domain 
and larger timesteps could result in large errors. There 
seems to be no substantial difference in accuracy between 
the two approaches, and we believe both are equally 
valid. 

Conclusions 

Our results have shown that the SBS is generally a very 
accurate method, at least comparable to or, in most cases, 
better than its competitors. However, the real strength 
of the SBS is this accuracy combined with the fact that 
its efficiency curve has a relatively low gradient; in other 
words, it is an accurate method that loses little of its 
accuracy as it is speeded up, allowing for fast, robust 
and accurate simulations. This is because as runtime is 
shortened, the SBS uses more and more extrapolations to 
maintain its accuracy. At the same time, the use of larger 
timesteps means less overhead overall, allowing the SBS 
to be very efficient. It is in such parameter regimes that 
the SBS can achieve its full potential. In addition to this, 
we believe the SBS is also able to achieve high weak order 
(in the moments; the variance remains one) in the small- 
to-moderate noise regime, that is when the number of 
molecules in the system is moderate to large, and also 
when the timesteps are large compared to the noise level. 
Its performance in this regime is accelerated as more and 
more extrapolations are performed, giving it exceptional 
accuracy. 




0 5 10 15 0 5 10 15 

time(sec) time(sec) 



Figure 9 SBS regimes for chain decay example. Evolution in time of (a) stepsize using different initial stepsizes r(0), and (b) the column of the 
Romberg table at which the solutions converge sufficiently (this is not necessarily k, as the error level could be accepted at only k - 1 or even k + 1 ). 
SBS parameters are Si = S 2 = 0.5, a to i = hoi = 1 0 -6 . For clarity not every point has been given a marker. 
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As the SBS is an explicit method, it is not necessarily 
suited for solving especially stiff problems. In such cases, 
Runge-Kutta methods with larger regions of stability, such 
as the stochastic Runge-Kutta method [44], are more ideal, 
as well as implicit or multiscale methods [45-47]. The ini- 
tial stepsize of the SBS should be chosen appropriately, 
as it may be possible for the SBS to settle in a higher- 
stepsize regime, which could affect accuracy, or a low- 
stepsize regime, which could affect runtime. In addition, 
r(0) should be chosen such that it is within the stability 
region of the modified midpoint method. Running a few 
preliminary simulations can help choose r(0). 

In previous work, we have extended Richardson extrap- 
olation into the discrete stochastic regime [22]. In this 
framework, full simulations with fixed stepsize are run 
over t = [ 0, T], and their moments are extrapolated to find 
accurate approximations to the moments at time T. In 
contrast, the SBS uses extrapolation within each timestep 
and varies r to optimise efficiency. Thus the SBS is a com- 
plementary approach to extrapolated r-leap methods that 
has two advantages: first, the stepsize can be adapted to 
lower runtime and eliminate the need for finding a suit- 
able range of fixed stepsizes; second, the SBS returns an 
entire histogram, rather than just the moments. This can 
be desirable in many cases, especially if the solutions do 
not follow a simple distribution such as a Gaussian or 
Poisson, or have multiple stable states. 

In this paper we have introduced a new efficient and 
robust simulation method, the Stochastic Bulirsch-Stoer 
method, which can also achieve higher weak order in the 
moments for certain systems. This is inspired by the deter- 
ministic method of the same name, and as such it also 
boasts the two main advantages of that method: its speed 
and its high accuracy. We have shown using numerical 
simulations that for a range of example problems, it is gen- 
erally the most efficient and robust out of all recent r-leap 
methods that we tested, which are the current state-of- 
the-art in fast stochastic simulation. Thus the SBS is a 
promising new method to address the need for fast and 
efficient discrete stochastic methods. 



Appendix: Stochastic Bulirsch-Stoer full algorithm 

Here we explain in detail the Stochastic Bulirsch-Stoer 
method. The aim of the stepsize adapting mechanism 
of the SBS (shared with the deterministic Bulirsch-Stoer 
method) is to select the optimal column k of the Romberg 
table (Table 1) that will give an acceptably low error while 
requiring as little computational work as possible. We 
define the error of each Romberg column q as 



err a = 



Mr "Mr 



*tol + r tol X Mr 



where /x T = M/CO ls the mean of Kj as defined in Eq. (10), 
which is equivalent to Aa T (t n , t n + r) from Algorithm 4, 
and | v| denotes the L 2 norm of the vector v. The most ideal 
situation is if the error of the /c-th column, err^ = 1: if it 
is larger than one, accuracy has been lost because r was 
too large; if it is smaller than one, computational time has 
been lost because r was unnecessarily small. Below, we 
follow Refs. [24,28] in our exposition. An idea of how r 
can be adjusted to its optimal value for the next step is 
given by 



tq = r Si[ 

err a 



i 

207-D+l 



q=l,...,k, 



where x q is a set of hypothetical new stepsizes adjusted 
from the current stepsize r. S\ and S2 are safety fac- 
tors 0 < S\,S2 < 1, that ensure r is not set too large 
because of errors in the MMP and composite trapezoidal 
rule approximations. 

We want the column that minimises the work done per 
unit step. This is defined for column q as 



w q = 



q=l,2,. 



(14) 



where A q is the work done in computing the q-th Romberg 
table row and is assumed to be the number of function 
evaluations inside the MMR An MMP with stepsize f = 
r/2 needs three evaluations, i.e. A\ = 3 using our scheme; 
this can be generalised to 



A q +i =A q + n q +i, 



where n q = 2q. The optimal column k for the next 
timestep is given by the lowest W q , and the optimal step- 
size by the corresponding r q . In reality, after the initial 
step only columns k — 1, k and k + 1 are tested for con- 
vergence, as otherwise the convergence is likely to be an 
artifact or the timestep is far off its optimal size. This helps 
reduce the runtime but makes the implementation more 
complicated. 

Now that the reasoning behind the adaptive mech- 
anism is clear, we set out a detailed algorithm for 
a practical implementation of the Stochastic Bulirsch- 
Stoer method. To implement the SBS -DA instead, 
Algorithm 4 should be replaced with Algorithm 3, which 
would calculate the mean and variance of Kj accord- 
ing to Eqs. (10) and (11). In addition, there should 
be two Neville tables, one for the mean and one for 
the variance, which find the extrapolated solutions to 
each. 
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Algorithm 5 Stochastic Bulirsch-Stoer method (SBS) 



With the system in state X n at time t n , fixed values for S\, S2, citol an d r tol> an d q — 0: 

1. Compute up to Romberg table column k — 1: 

(a) Setq = q+1. 

(b) Run Algorithm 4 with n q = 2q substeps to find Aa T i (t„, t„ + r). If X n + VjAa- q (t„, t n + r) < 0, set r = r/2 

and redo step by returning to the start of Step 1. Otherwise, add Aa T ^(t n) t n + r) to the end of the first column of 
the Romberg table. 

(c) If q > 1, starting with the first column, extrapolate the final row of each Romberg table column in succession to 
eventually find the first row of column q. Set this as Aa extr (t n , t n + r), the current most accurate estimate for the 
Poisson parameter. 

(d) If q < k — 1, return to Step 1(a); otherwise continue. 

2. Check convergence at column k — 1: 

(a) Find err k -\ using Eq. (14). 

(b) If err^_i < 1, accept step (go to Step 5) and set X try = X n + {Aa extr {t, t + r)), as in Eq. (6), and set ic 
and for the next step as 

'k-% x knew if W k - 2 < 0.8Wi_i 



fc ^-17^- if < 0.9W*_ 2 

/c - 1, otherwise. 



(c) If err k _\ > 1, estimate err k+ i to check for convergence by column k + 1: if err k _\ > (^zT~) ' re j ec t the ste P> 

set ic and x k as in Step 2(b), set q — 0 and return to Step 1. Otherwise, continue. 

3. Compute and check column ic: 

(a) Run Algorithm 4 with n k — 2k substeps to give Aa Tk (t n , t n + x). Add Aa Tk (t n , t n + r) to the end of the first column 
of the Romberg table, and extrapolate to give Aa extr {t n , t n + x). 

(b) Find err k using Eq. (14). If err k < 1, accept step (go to Step 5) and set X try = X n + Y^L\ {^a extr {t, f + t)), 
and set ic and r k for the next step as 



k n ew>X knew — 



k-l, r knew if < 0.8Wit 

/c+1, x k ^- if WJt < 0.9Wit-i 
k, r knew otherwise. 



(c) If err k > 1, estimate err k+ i to check for convergence by column k + 1: if err k > (^p) > reject the step, set ic and 
r k as in Step 3(b), set q = 0 and return to Step 1. Otherwise, continue. 

4. Compute and check column /c+1: 

(a) Run Algorithm 4 with n k +\ — 2(/c + 1) substeps to give Aa Tk+l (t n , t n + x). Add Aa Tk+l (t n , t„ + r) to the end of the 
first column of the Romberg table, and extrapolate to give Aa extr (t n , t n + x). 

(b) Find err k+ i using Eq. (14). If err k+ i < 1, accept step (go to Step 5) and set 

X try = X n + YlfLi v fP (Aa extr (t, t + r)^, and optimal ic and x k for next step as 
■k-l, x knew if < 0.8Wit 



k n ew>X knew — 



/c+1, if Wit+i < 0.9Wit 



/c, r^ ew otherwise, 

(c) If > 1, reject the step, set ic and x k as in Step 4(b), set q = 0 and return to Step 1. Otherwise, continue. 

5. If any species of Xtry is negative, reject the step, set r = r/2, # = 0 and go back to Step 1. Otherwise update t n+ \ =t n + x 
and X n+ \ = X^y, set q = 0 and continue (either return to Step 1 or finish). 
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Additional file 



Additional file 1 : Supplementary information. These show full sets of 
simulation results for all chemical species of all four test systems, using all 
the simulation methods we tested. 
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